Part One: Identifying Bad Visualizations
If you happen to be bored and looking for a sensible chuckle, you should check out these Bad Visualisations. Looking through these is also a good exercise in cataloging what makes a visualization good or bad.
Dissecting a Bad Visualization
Below is an example of a less-than-ideal visualization from the collection linked above. It comes to us from data provided for the Wellcome Global Monitor 2018 report by the Gallup World Poll:
While there are certainly issues with this image, do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
The graph shows the percentage of people in each country who believe vaccines are safe, grouped by global region. The authors appear to want to convey regional differences in vaccine trust, highlight outliers, and make the point that higher-income countries (such as the US) do not always have high trust in vaccines.
List the variables that appear to be displayed in this visualization. Hint: Variables refer to columns in the data.
Country, region, percent of people who believe vaccines are safe, and the median of that percent by region.
Now that you’re versed in the grammar of graphics (e.g., ggplot), list the aesthetics used and which variables are mapped to each.
x = % who believe in vaccines
y = country
color = region
What type of graph would you call this? Meaning, what geom would you use to produce this plot?
Because each individual observation is a point, I would call this a geom_point() plot.
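As a rough illustration of the mapping listed above, here is a minimal sketch that puts the percent on x, the country on y, and the region on color with geom_point(). The toy data frame and its values are hypothetical and made up for illustration; only the column names country, region, and percent_safe mirror names used elsewhere in this document.

```r
library(ggplot2)

# Hypothetical toy data; these values are NOT from the Wellcome report
toy <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  region = c("Region 1", "Region 1", "Region 2", "Region 2"),
  percent_safe = c(62, 81, 74, 90)
)

# One point per country: x = percent, y = country, color = region
p <- ggplot(toy, aes(x = percent_safe, y = country, color = region)) +
  geom_point(size = 3) +
  labs(x = "% who believe vaccines are safe", y = NULL, color = "Region")

p
```

Each row of the data becomes one point, which is why geom_point() is a reasonable guess for the geom behind the original figure.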
Provide at least four problems or changes that would improve this graph. Please format your changes as bullet points!
# region codes
region_codes_list <- dict$`Variable Type & Codes*`[57] |>
  str_split(",", simplify = TRUE) |>
  as_tibble() |>
  pivot_longer(cols = everything(), names_to = NULL, values_to = "col") |>
  mutate(col = str_trim(col)) |> # trim white space
  filter(col != "") |> # filter out blanks
  separate_wider_delim("col", delim = "=", names = c("Regions_Report", "region")) |>
  mutate(
    region_code = as.integer(str_trim(Regions_Report)),
    region = str_trim(region),
    Regions_Report = as.integer(Regions_Report)
  )

# join data
full_data <- data |>
  left_join(country_codes_list, by = "code") |>
  left_join(region_codes_list, by = "Regions_Report")
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
safe_vax_pct |>
  filter(!is.na(region)) |>
  group_by(region) |>
  mutate(
    country = fct_reorder(country, percent_safe),
    region_median = median(percent_safe, na.rm = TRUE)
  ) |>
  ungroup() |>
  ggplot(aes(x = percent_safe, y = country)) +
  geom_vline(aes(xintercept = region_median), linetype = "dashed", color = "black") +
  geom_point(aes(color = region), size = 3) +
  facet_wrap(~ region, scales = "free_y") +
  scale_color_manual(values = wes_palette("Zissou1", n = 6, type = "continuous")) +
  labs(
    title = "% of people who believe vaccines are safe, by country and global region",
    subtitle = "Dark vertical lines represent region medians",
    x = "% who believe vaccines are safe",
    y = NULL,
    caption = "Source: Wellcome Global Monitor, Gallup World Poll 2018"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 15),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none"
  )
For this second plot, you must select a plot that uses maps so you can demonstrate your proficiency with the leaflet package!
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
I decided to recreate and improve Chart 2.3: Map of perceived knowledge about science by country, which is on page 27 of the report. The map shows how much people in each country say they know about science. The authors may be trying to convey that people's confidence in their science knowledge varies across countries. Certain countries may have lower confidence due to limited access to educational resources. However, it is important to note that plots like these can be damaging, as viewers might interpret lower metrics as a reflection of a country's intelligence or worth (when that is simply not true).
List the variables that appear to be displayed in this visualization.
Country
Percent who answered “a lot” or “some”
Surveyed status?
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
fill for the percents
geometry for country
color for survey status
What type of graph would you call this?
Choropleth map
List all of the problems or things you would improve about this graph.
use different colors for better contrast
hover over to see percents
differentiate NA’s more clearly
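Two of the improvements listed above (hover labels and clearer missing values) can be sketched without any map geometry. The snippet below is only an illustration under stated assumptions: the percentages are hypothetical, the gray "#BDBDBD" and the "Not surveyed" label are my own choices rather than anything taken from the report, and the fixed 0 to 100 domain is an assumption about the scale.

```r
library(leaflet)

# Hypothetical percentages; NA stands in for a country that was not surveyed
pct <- c(35.2, 58.9, NA, 71.4)

# Sequential palette over an assumed 0-100 domain; NA values get a neutral gray
pal <- colorNumeric("YlGnBu", domain = c(0, 100), na.color = "#BDBDBD")

# Colors that could be passed to a fill argument in addPolygons()
fill_colors <- pal(pct)

# Hover text that names missing values instead of printing "NA%"
hover_labels <- ifelse(is.na(pct), "Not surveyed", paste0(round(pct, 1), "%"))
```

A gray that sits outside the palette's hue range keeps unsurveyed countries from being mistaken for the low end of the scale.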
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
world <- ne_countries(type = "countries", scale = "small")

science_pct <- full_data |>
  mutate(country = if_else(country == "United States", "United States of America", country)) |>
  group_by(country) |>
  summarise(
    total_strong = sum(Q1 %in% c(1, 2), na.rm = TRUE),
    total = n(),
    percent_strong = total_strong / total * 100
  )

map_data <- world |>
  left_join(science_pct, by = c("name" = "country"))

qpal <- colorNumeric("YlGnBu", domain = map_data$percent_strong, na.color = "white")

leaflet(map_data) |>
  addTiles() |>
  addPolygons(
    stroke = FALSE, smoothFactor = 0.2, fillOpacity = 1,
    color = ~qpal(percent_strong),
    label = ~paste0(name, ": ", round(percent_strong, 1), "%")
  ) |>
  addLegend(pal = qpal, values = map_data$percent_strong, title = "Knowledge Level (%)", position = "bottomright") |>
  addControl(html = "Map of perceived knowledge about science by country", position = "topright")
Third Data Visualization Improvement
For this third plot, you must use one of the other ggplot2 extension packages mentioned this week (e.g., gganimate, plotly, patchwork, cowplot).
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
For the third visualization, I chose Chart 3.1: Trust in Scientists Index showing levels of trust by region, which is on page 53. The chart shows how much people in each region trust scientists. The authors may be trying to show which regions trust scientists less, in order to identify where resources or policy changes could be needed.
List the variables that appear to be displayed in this visualization.
Region
Trust level
Percent for each level within each region
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
y = region
x = percent
fill = trust level
What type of graph would you call this?
This is a stacked bar chart.
List all of the problems or things you would improve about this graph.
colors are not intuitive
no clear sorting
fonts are too small
too crowded
could use interactive elements
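The problems listed above suggest a few concrete fixes: an ordered color scale, sorted regions, larger text, and hover tooltips. The following is a minimal sketch of those ideas, not the author's final plot. The data frame is hypothetical (region names and percentages are made up), and the "Blues" palette and base font size are illustrative choices.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data; region names and percentages are made up
trust <- data.frame(
  region = rep(c("Region A", "Region B", "Region C"), each = 3),
  level = rep(c("High", "Medium", "Low"), times = 3),
  percent = c(20, 55, 25, 35, 50, 15, 10, 60, 30)
)

# Give the trust levels an explicit order so the fill scale is sequential
trust$level <- factor(trust$level, levels = c("Low", "Medium", "High"))

# Sort regions by their share of "High" trust (lowest first, so highest plots on top)
high <- trust[trust$level == "High", ]
trust$region <- factor(trust$region, levels = high$region[order(high$percent)])

# Stacked horizontal bars; the text aesthetic is only used for the tooltip
p <- ggplot(trust, aes(x = percent, y = region, fill = level,
                       text = paste0(level, ": ", percent, "%"))) +
  geom_col() +
  scale_fill_brewer(palette = "Blues") +
  labs(x = "% of respondents", y = NULL, fill = "Trust level") +
  theme_minimal(base_size = 14)

# Convert to an interactive plotly figure with hover tooltips
fig <- ggplotly(p, tooltip = "text")
fig
```

Sorting by a single level gives the reader one clear comparison to follow; sorting by a different level would be equally valid depending on the story.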
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.